Biological data structure having multi-lateral, multi-scalar, and multi-dimensional relationships between molecular features and other data

ABSTRACT

A computer system maintains a biological data structure having molecular feature data. The system receives data elements indicating biological molecular features and knowledge elements that represent biological concepts. The system individually associates unique identifiers with the elements. For individual elements, the system maintains an internal element set of the other unique identifiers for the other elements that are directly associated with that one individual element. For the individual elements, the system maintains an external element set of the other unique identifiers for the other elements that have that one individual element in their own internal element sets. Although not required, the computer system may process a query indicating a search scope and a molecular feature for an individual biological entity, and responsively process the molecular feature and the elements based on the search scope to induce a knowledge sub-graph for the individual biological entity.

RELATED CASES

This patent application claims the benefit of U.S. non-provisional patent application Ser. No. 13/463,603 that was filed on May 3, 2012 and is entitled “BIOLOGICAL DATA STRUCTURE HAVING MULTI-LATERAL, MULTI-SCALAR, AND MULTI-DIMENSIONAL RELATIONSHIPS BETWEEN MOLECULAR FEATURES AND OTHER DATA,” which claims the benefit of U.S. provisional patent application 61/483,248 that was filed on May 6, 2011 and is entitled “COMPUTER SYSTEM AND METHOD TO AUTOMATE KNOWLEDGE RECOVERY, INFERENCE, AND LEARNING.” This patent application also claims the benefit of U.S. provisional patent application 61/555,217 that was filed on Nov. 3, 2011 and is entitled “COMPUTER SYSTEM AND METHOD TO AUTOMATE KNOWLEDGE RECOVERY, INFERENCE, AND LEARNING.” This patent application also claims the benefit of U.S. provisional patent application 61/596,859 that was filed on Feb. 9, 2012 and is entitled “BIOLOGICAL DATA STRUCTURE HAVING MULTI-LATERAL, MULTI-SCALAR, AND MULTI-DIMENSIONAL RELATIONSHIPS BETWEEN MOLECULAR FEATURES AND OTHER DATA.” U.S. provisional patent applications 61/483,248, 61/555,217, and 61/596,859 are hereby incorporated by reference into this patent application.

TECHNICAL BACKGROUND

Breakthroughs in genomic sequencing and analysis technologies are generating vast amounts of molecular feature data for both individuals and patient groups, such as a breast cancer patient group using a specific drug. In addition, the treatments used to combat various diseases and medical conditions are also rapidly expanding. The nexus of readily-available genomics and advanced medical approaches has created the opportunity to provide personalized medicine where an individual's own genetic data can be used to develop personalized treatments based on past case histories, genetic records, and medical research.

Various approaches to data structuring have been proposed to support personalized medicine based on individual genomic data. Object-oriented data, relational databases, hyper-graphs, Bayesian networks, and hierarchical temporal memories are a few examples of such approaches. Unfortunately, these approaches do not relate knowledge and data in an effective way to efficiently and robustly support personalized medicine at the molecular level.

OVERVIEW

A computer system maintains a biological data structure having molecular feature data. The system receives data elements indicating biological molecular features and knowledge elements that represent biological concepts. The system individually associates unique identifiers with the elements. For individual elements, the system maintains an internal element set of the other unique identifiers for the other elements that are directly associated with that one individual element. For the individual elements, the system maintains an external element set of the other unique identifiers for the other elements that have that one individual element in their own internal element sets. Although not required, the computer system may process a query indicating a search scope and a molecular feature for an individual biological entity, and responsively process the molecular feature and the elements based on the search scope to induce a knowledge sub-graph for the individual biological entity.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates a biological data system that includes a computer system to maintain a biological data structure.

FIG. 2 shows a data structure to illustrate a data model implemented by a data processing system to maintain a biological data structure.

FIG. 3 further illustrates the data model implemented by the data processing system to maintain the biological data structure.

FIG. 4 illustrates attribute relationships in the data model implemented by the data processing system to maintain the biological data structure.

FIG. 5 further illustrates the attribute relationships.

FIG. 6 illustrates function relationships in the data model implemented by the data processing system to maintain the biological data structure.

FIG. 7 further illustrates the function relationships.

FIG. 8 illustrates a knowledge sub-graph induced by an external/external second order search through the biological data structure.

FIG. 9 illustrates a knowledge sub-graph induced by an internal/internal second order search through the biological data structure.

FIG. 10 illustrates a knowledge sub-graph induced by an external/internal second order search through the biological data structure.

FIG. 11 illustrates a knowledge sub-graph induced by an internal/external second order search through the biological data structure.

FIG. 12 illustrates a bio-intelligence system including a public data structure that uses the data model.

FIG. 13 illustrates a computer system to implement the data model to maintain the biological data structure.

DETAILED DESCRIPTION

FIG. 1 illustrates biological data system 100 that includes computer system 110 to maintain biological data structure 114. Biological data structure 114 contains molecular feature data with a few examples being genes, gene variants, gene expression data, gene states, and the like. Computer system 110 comprises communication interface 111 and data processing system 112. Data processing system 112 includes data storage system 113 which stores biological data structure 114.

Computer system 110 is comprised of computer circuitry, memory devices, software, and communication components. Note that communication interface 111 and data systems 112-113 may be integrated together on a single platform or may be geographically distributed across multiple diverse computer and communication systems. Likewise, communication interface 111 and data systems 112-113 may individually comprise a single platform or be geographically distributed across multiple diverse computer and communication components.

In operation, communication interface 111 receives data elements 101 that indicate biological molecular features. Communication interface 111 also receives knowledge elements 102 that represent biological concepts, such as disease signatures, disease classifications, drug signatures, and drug classifications. In addition, communication interface 111 receives other data 103 that may comprise other data elements, knowledge elements, attributes, data processing functions, or various other types of data and instructions. For example, the additional data elements might include drugs, drug states, diseases, and disease states. Additional knowledge elements might include oncology treatments, signaling pathways, nucleic acid repairs, and the like. Note that the distinction between data elements and knowledge elements is arbitrary within biological data system 100, and the distinction is made to help understand the full and robust capability of system 100.

Data processing system 112 individually associates a Universally Unique Identifier (UUID) with each one of data elements 101, knowledge elements 102, and also with any additional data and/or knowledge elements in other data 103. The UUID should be unique within data system 100, and in some examples, the UUID is also unique across several disparate systems. For a scenario with many diverse systems, the UUIDs generated by any given system should be statistically universally unique across all of the systems to support data mergers and queries across the systems and to support data references in systems that are not suitably referenced. In some examples, data processing system 112 also associates unique identifiers with individual attributes and/or data processing functions. Data storage system 113 stores the data elements in association with their UUIDs and other relationship data in biological data structure 114.
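As a rough illustration of the UUID assignment described above, the following Python sketch registers each incoming element under a random version-4 UUID from the standard `uuid` module. The `Element` and `ElementRegistry` names are hypothetical, and the choice of UUID version 4 is an assumption; the text does not say how the identifiers are generated.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Element:
    """A data or knowledge element (hypothetical representation)."""
    label: str
    kind: str  # "data" or "knowledge"; the distinction is arbitrary
    uid: uuid.UUID = field(default_factory=uuid.uuid4)


class ElementRegistry:
    """Stores every element keyed by its UUID (hypothetical helper)."""

    def __init__(self) -> None:
        self._by_uid: dict[uuid.UUID, Element] = {}

    def register(self, label: str, kind: str) -> uuid.UUID:
        element = Element(label, kind)
        # Version-4 UUIDs carry 122 random bits, so identifiers minted by
        # independent systems are statistically unique and can be merged.
        if element.uid in self._by_uid:
            raise RuntimeError("UUID collision")
        self._by_uid[element.uid] = element
        return element.uid

    def get(self, uid: uuid.UUID) -> Element:
        return self._by_uid[uid]


registry = ElementRegistry()
feature_a = registry.register("molecular feature A", "data")
treatment_e = registry.register("oncology treatment E", "knowledge")
print(feature_a, registry.get(treatment_e).label)
```

Random identifiers avoid any central counter, which is what allows separately maintained systems to merge their data later without renumbering.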

For individual data elements 101, data processing system 112 maintains an internal element set of the UUIDs for the other data and/or knowledge elements that are directly associated with that individual data element. For individual knowledge elements 102, data processing system 112 maintains an internal element set of the UUIDs for the other data and/or knowledge elements that are directly associated with that individual knowledge element. In a similar manner, data processing system 112 may maintain similar internal element sets of UUIDs for the data and knowledge elements in other data 103. These direct internal associations may be indicated by system personnel, table look-ups, automated rule sets, or learning algorithms. For example, Bayesian belief propagation systems, hierarchical temporal memories, and neural networks could be used to identify some of the internal relationships.

For individual data elements 101, data processing system 112 maintains an external element set of the UUIDs for the other data and knowledge elements that have that individual data element in their own internal element set. For individual knowledge elements 102, data processing system 112 maintains an external element set of the UUIDs for the other data and knowledge elements that have that individual knowledge element in their own internal element set. Likewise, data processing system 112 may maintain similar external element sets of UUIDs for the data and knowledge elements in other data 103.
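The reciprocal bookkeeping between internal and external sets can be sketched in Python as below. The sketch assumes plain string identifiers standing in for UUIDs, and `BioGraph`, `relate`, and `unrelate` are hypothetical names. Its point is that every change to an internal set is mirrored in the corresponding external set in the same step, so the two views cannot drift apart.

```python
from collections import defaultdict


class BioGraph:
    """Internal and external element sets, keyed by element identifier."""

    def __init__(self) -> None:
        self.internal: defaultdict[str, set[str]] = defaultdict(set)
        self.external: defaultdict[str, set[str]] = defaultdict(set)

    def relate(self, owner: str, member: str) -> None:
        """Record that `member` is directly associated with `owner`."""
        self.internal[owner].add(member)
        # Mirror the direct relationship in the member's external set.
        self.external[member].add(owner)

    def unrelate(self, owner: str, member: str) -> None:
        """Remove a direct relationship from both sets."""
        self.internal[owner].discard(member)
        self.external[member].discard(owner)


graph = BioGraph()
graph.relate("211", "201")  # oncology treatment E lists molecular feature A
graph.relate("211", "203")
print(dict(graph.internal), dict(graph.external))
```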

Note that the terms “internal” and “external” as used herein could be replaced by other distinguishing terms as desired. For a given element, the “internal” set typically includes other elements that comprise or characterize that given element in the manner that pieces of data comprise or characterize a knowledge concept. The “external” sets of the various elements mirror these direct “internal” relationships.

In addition to maintaining biological data structure 114, computer system 110 also processes queries to return knowledge results. Communication interface 111 receives query 104 that indicates molecular feature data for an individual biological entity, such as a gene variation for a cancer patient. Data processing system 112 processes the molecular feature data from query 104 and the data elements in data structure 114 to identify any of the data elements having corresponding biological molecular features. Pattern matching, hierarchical temporal memory, neural networks, or some other data processing technique could be used to identify the corresponding biological molecular features.

Data processing system 112 induces a knowledge sub-graph for the individual biological entity based on the internal element sets and/or the external element sets of the identified data elements having the corresponding biological molecular features. In a first order search, the corresponding molecular feature elements and the first order elements listed in their external and/or internal sets are returned. In a second order search, the second order elements in the external and/or internal sets of the first order elements are also returned. At a given order, the search may be external, internal, or both depending on the search scope. In this manner, sub-graphs are induced responsive to the search scope in query 104.
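The order-by-order expansion can be sketched as a breadth-first search whose scope is chosen separately at each order. The function name `induce_subgraph`, the `"internal"`/`"external"`/`"full"` scope strings, and the `(owner, member)` edge encoding are assumptions made for illustration; the text does not prescribe a representation.

```python
def induce_subgraph(seeds, scopes, internal, external):
    """Induce a sub-graph by expanding the frontier once per search order.

    seeds:    identifiers of the elements matched by the query.
    scopes:   one entry per order: "internal", "external", or "full".
    internal: element -> set of elements in its internal set.
    external: element -> set of elements in its external set.
    Returns the visited elements and the traversed (owner, member) pairs,
    each pair standing for one direct relationship.
    """
    nodes = set(seeds)
    edges: set[tuple[str, str]] = set()
    frontier = set(seeds)
    for scope in scopes:
        next_frontier = set()
        for element in frontier:
            if scope in ("internal", "full"):
                for member in internal.get(element, ()):
                    edges.add((element, member))
                    next_frontier.add(member)
            if scope in ("external", "full"):
                for owner in external.get(element, ()):
                    edges.add((owner, element))
                    next_frontier.add(owner)
        nodes |= next_frontier
        frontier = next_frontier
    return nodes, edges


# Toy example: knowledge element T comprises data elements A and B.
internal = {"T": {"A", "B"}}
external = {"A": {"T"}, "B": {"T"}}
nodes, edges = induce_subgraph({"A"}, ["external", "internal"], internal, external)
print(sorted(nodes))  # ['A', 'B', 'T']
```

Keeping the per-order scope as a list means a mixed search, such as external at the first order and internal at the second, needs no special handling.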

In biological data structure 114, the data elements may be associated with attributes and functions that have associated values and states. In these examples, data processing system 112 is configured to search data structure 114 for specific attribute types and specific function types, including specific attribute and function types having specific values or states. The results of attribute/function searching could then be used to induce knowledge sub-graphs as described herein.

Communication interface 111 transfers knowledge result 105 representing the induced sub-graph for the individual biological entity. For example, computer system 110 may provide a knowledge sub-graph for a cancer patient based on the patient's own specific gene variation, where the sub-graph indicates an invaluable collection of relevant data and knowledge that is specific to the patient at the molecular level.

FIGS. 2-7 show data set 200 to illustrate the data model that is implemented by data processing system 112 to maintain biological data structure 114. Although specific types of data and knowledge elements are shown on FIGS. 2-7, these specific elements are merely exemplary, and a multitude of other data and knowledge elements would typically be used. Thus, no real-world correlations of molecular features to treatments, drugs, or other elements are intended on FIGS. 2-7.

Data set 200 includes data elements 201-204 and knowledge elements 211-213. On FIG. 2, direct relationships 251-259 are indicated by lines with dots at one end. The element on the “dot” has the internal set, and the other element on the line (no dot) is in that internal set. For example, the internal set for knowledge element 211 (oncology treatment “E”) includes the UUID for data element 201 (molecular feature “A”) as represented by direct relationship 253. Based on direct relationship 253, the external element set for data element 201 includes the UUID for knowledge element 211.

Note how the relationship between elements 201 and 211 has a directed aspect in that element 211 relates itself directly to element 201 in its own internal set. This relationship between elements 201 and 211 also has an undirected aspect in that element 201 relates back to element 211 through its own external set in an undirected manner. Also note how data and knowledge can be inter-related.

Knowledge element 213 is directly related to elements 203, 211, and 212 by respective relationships 256, 258, and 259. Knowledge element 211 is directly related to elements 201 and 203 by respective relationships 253 and 254. Knowledge element 212 is directly related to elements 204 and 211 by respective relationships 255 and 257. Data elements 201-202 are directly related to each other by respective relationships 251-252.

FIG. 3 shows data set 200 to further illustrate the data model for biological data structure 114. The internal element sets and the external element sets for elements 201-204 and 211-213 are now shown along with their corresponding direct relationships 251-259.

Knowledge element 213 is directly related to elements 203, 211, and 212, and as a result, element 213 indicates the UUIDs for elements 203, 211, and 212 in its internal set. Knowledge element 211 is directly related to elements 201 and 203, and as a result, element 211 indicates the UUIDs for elements 201 and 203 in its internal set. Knowledge element 212 is directly related to elements 204 and 211, and as a result, element 212 indicates the UUIDs for elements 204 and 211 in its internal set. Data elements 201-202 are directly related to each other, and as a result, elements 201-202 indicate the UUID for each other in their internal sets.

In a reciprocal fashion, element 201 is in the internal sets of elements 202 and 211, and as a result, element 201 indicates the UUIDs for elements 202 and 211 in its external set. Element 202 is in the internal set of element 201, and as a result, element 202 indicates the UUID for element 201 in its external set. Element 203 is in the internal sets of elements 211 and 213, and as a result, element 203 indicates the UUIDs for elements 211 and 213 in its external set. Element 204 is in the internal set of element 212, and as a result, element 204 indicates the UUID for element 212 in its external set. Element 211 is in the internal sets of elements 212 and 213, and as a result, element 211 indicates the UUIDs for elements 212 and 213 in its external set. Element 212 is in the internal set of element 213, and as a result, element 212 indicates the UUID for element 213 in its external set.
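The reciprocal sets listed above can be checked mechanically. The following sketch uses the element numbers of data set 200 as stand-in identifiers (real entries would hold UUIDs) and derives every external set by inverting the internal sets given for FIG. 3; `derive_external` is a hypothetical helper name.

```python
from collections import defaultdict

# Internal element sets of data set 200 as listed for FIG. 3.
INTERNAL = {
    "201": {"202"},
    "202": {"201"},
    "211": {"201", "203"},
    "212": {"204", "211"},
    "213": {"203", "211", "212"},
}


def derive_external(internal):
    """Invert the internal sets: X is in external[Y] iff Y is in internal[X]."""
    external = defaultdict(set)
    for owner, members in internal.items():
        for member in members:
            external[member].add(owner)
    return dict(external)


EXTERNAL = derive_external(INTERNAL)
for element in sorted(EXTERNAL):
    print(element, sorted(EXTERNAL[element]))
```

Element 213 gets no external set because no other element lists it internally, which matches its position at the top of the FIG. 3 hierarchy.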

FIG. 4 shows data set 200 to illustrate attribute relationships in the data model implemented by data processing system 112 to maintain biological data structure 114. Data set 200 now shows elements 201, 204, 211, and 213, and attributes 405-407 have been added. Attributes 405-407 each comprise an attribute identifier (ID), a data type, and a data value. Data processing system 112 may use UUIDs to identify the attributes or use some other type of ID technique. Data attributes could comprise any data values including age, lifestyle metrics, geographic location, ethnicity, gender, project, company, status, and the like. Although specific attribute types are shown on FIG. 4, these specific attributes are merely exemplary, and a multitude of other attributes would typically be used.

On FIG. 4, attribute relationships 471-475 are also indicated by lines with dots at one end. The element on the “dot” has an attribute set, and the attribute on the line (no dot) is in that attribute set. For example, the attribute set for knowledge element 211 (oncology treatment “E”) includes the attribute ID for attribute 405 (non-smoker) as represented by attribute relationship 473. Based on attribute relationship 473, the element set for attribute 405 would include the UUID for knowledge element 211. Note how various data elements and knowledge elements may share attributes. Knowledge elements 211 and 213 are both related to attribute 407 by respective attribute relationships 474-475. Data element 201 and knowledge element 211 are both related to attribute 405 by respective attribute relationships 471 and 473. Data element 204 is related to attribute 406 by attribute relationship 472.

If attribute searching is supported, then data processing system 112 is configured to search biological data structure 114 (including data set 200) to identify specific attribute types or specific attribute types having specific values. For example, attribute 407 would be identified in a search for the attribute type “FDA Approval” or in a search for the attribute type “FDA Approval” having the corresponding “N” value. These searches may include combinations of elements and attributes, so a search for all oncology treatment elements with an attribute type/value of “FDA Approval N” would return knowledge element 211 (“Oncology Treatment E”).
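A combined element/attribute search of this kind might look like the sketch below. Only attribute 407 (“FDA Approval” with value “N”) is fully described in the text; the type and value used for attribute 405, the contents of attribute 406, and the concept types of elements 204 and 213 are placeholders. `Attribute` and `search` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    attr_id: str
    attr_type: str
    value: str


# Attributes of FIG. 4; 405's type and all of 406 are placeholders.
ATTRIBUTES = {
    "405": Attribute("405", "Smoking Status", "Non-smoker"),
    "406": Attribute("406", "unspecified", "unspecified"),
    "407": Attribute("407", "FDA Approval", "N"),
}

# Element concept types; 204 and 213 are not characterized in the text.
ELEMENT_KIND = {
    "201": "molecular feature",
    "204": "unspecified",
    "211": "oncology treatment",
    "213": "unspecified",
}

# Attribute sets per element (attribute relationships 471-475).
ATTRIBUTE_SETS = {
    "201": {"405"},
    "204": {"406"},
    "211": {"405", "407"},
    "213": {"407"},
}


def search(attr_type, value=None, kind=None):
    """Return elements whose attribute set holds a matching attribute.

    `value` and `kind` are optional filters; omitting `value` matches any
    value of the given attribute type.
    """
    hits = set()
    for element, attr_ids in ATTRIBUTE_SETS.items():
        if kind is not None and ELEMENT_KIND[element] != kind:
            continue
        for attr_id in attr_ids:
            attr = ATTRIBUTES[attr_id]
            if attr.attr_type == attr_type and value in (None, attr.value):
                hits.add(element)
    return hits


print(search("FDA Approval"))
print(search("FDA Approval", "N", kind="oncology treatment"))
```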

FIG. 5 shows data set 200 to further illustrate the attribute relationships in the data model for biological data structure 114. The attribute sets for elements 201, 204, 211, and 213, and the element sets for attributes 405-407 are now shown along with their corresponding attribute relationships 471-475. Knowledge elements 211 and 213 are related to attribute 407, and as a result, elements 211 and 213 each indicate the ID for attribute 407 in their attribute set. Data element 201 and knowledge element 211 are related to attribute 405, and as a result, elements 201 and 211 each indicate the ID for attribute 405 in their attribute set. Data element 204 is related to attribute 406, and as a result, element 204 indicates the ID for attribute 406 in its attribute set.

In a reciprocal fashion, attribute 405 is in the attribute sets of elements 201 and 211, and as a result, attribute 405 indicates the UUIDs for elements 201 and 211 in its element set. Attribute 406 is in the attribute set of element 204, and as a result, attribute 406 indicates the UUID for element 204 in its element set. Attribute 407 is in the attribute sets of elements 211 and 213, and as a result, attribute 407 indicates the UUIDs for elements 211 and 213 in its element set.

FIG. 6 shows data set 200 to illustrate function relationships in the data model implemented by data processing system 112 to maintain biological data structure 114. Data set 200 now shows elements 202, 203, 212, and 213, and functions 608-610 have been added. Functions 608-610 each comprise a function identifier (ID), a function type, and function logic. Data processing system 112 may use UUIDs to identify the functions or use some other type of ID technique. Data functions could be event handlers, message triggers, or some other data processing task. Although specific function types are shown on FIG. 6, these specific functions are merely exemplary, and a multitude of other functions would typically be used.

Data processing system 112 executes the data processing functions directly associated with a data or knowledge element when it handles that element in data structure 114. For example, the knowledge element for a specific form of carcinoma may have a notice function to email a key research scientist whenever the carcinoma knowledge element is handled in a specific context. In other cases, data processing system 112 may invoke functions based on external events and conditions. For example, a given data element may have a delete function for 12/31/2018, and when data processing system 112 eventually receives the event that today is 12/31/2018, it searches for 12/31/2018 event functions and responsively deletes the given data element from the system.
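Both triggering styles, functions run when an element is handled and functions run in response to an external date event, can be sketched as follows. The `Function` and `Store` classes, the `"on_handle"`/`"on_date"` type strings, and the element identifiers are hypothetical, and an in-memory log stands in for sending e-mail.

```python
import datetime as dt
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Function:
    """A function ID, a function type, and function logic."""
    func_id: str
    func_type: str  # "on_handle" or "on_date" in this sketch
    logic: Callable[["Store", str], None]
    trigger_date: Optional[dt.date] = None


class Store:
    """Elements with function sets; `log` stands in for e-mail side effects."""

    def __init__(self) -> None:
        self.elements: dict[str, str] = {}
        self.function_sets: dict[str, list[Function]] = {}
        self.log: list[str] = []

    def add(self, element_id, label, functions=()):
        self.elements[element_id] = label
        self.function_sets[element_id] = list(functions)

    def handle(self, element_id):
        """Handle an element, running its directly associated on-handle functions."""
        for fn in self.function_sets[element_id]:
            if fn.func_type == "on_handle":
                fn.logic(self, element_id)
        return self.elements[element_id]

    def on_date_event(self, today):
        """Search for every function keyed to `today` and run it."""
        # Collect first so that deletions do not mutate the dict mid-iteration.
        due = [(element_id, fn)
               for element_id, fns in self.function_sets.items()
               for fn in fns
               if fn.func_type == "on_date" and fn.trigger_date == today]
        for element_id, fn in due:
            fn.logic(self, element_id)

    def delete(self, element_id):
        self.elements.pop(element_id, None)
        self.function_sets.pop(element_id, None)


def notify_scientist(store, element_id):
    store.log.append(f"notify: {element_id} handled")


def delete_element(store, element_id):
    store.delete(element_id)


store = Store()
store.add("carcinoma-x", "carcinoma knowledge element",
          [Function("f-notify", "on_handle", notify_scientist)])
store.add("expiring-1", "data element scheduled for deletion",
          [Function("f-delete", "on_date", delete_element, dt.date(2018, 12, 31))])
store.handle("carcinoma-x")
store.on_date_event(dt.date(2018, 12, 31))
print(store.log, sorted(store.elements))
```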

If function searching is supported, then data processing system 112 is configured to search biological data structure 114 (including data set 200) to identify specific function types or specific function types having specific values or states. For example, function 609 would be identified in a search for the function type “Send Message” or in a search for the function type “Send Message” having the corresponding “Z” value. These searches may include combinations of elements, attributes, and functions, so a search for all drug data elements with a function type/value of “Send Message Z” would return data element 203 (“Drug C”).

On FIG. 6, function relationships 681-685 are also indicated by lines with dots at one end. The element on the “dot” has a function set, and the function on the line (no dot) is in that function set. For example, the function set for knowledge element 212 (signaling pathway “X”) includes the function ID for function 610 (increment counter “P”) as represented by function relationship 684. Based on function relationship 684, the element set for function 610 would include the UUID for knowledge element 212. Note how various data elements and knowledge elements may share functions. Knowledge elements 212 and 213 are both related to function 610 by respective function relationships 684-685. Data element 202 and knowledge element 212 are both related to function 608 by respective function relationships 681 and 683. Data element 203 is related to function 609 by function relationship 682.

FIG. 7 shows data set 200 to further illustrate the function relationships in the data model for biological data structure 114. The function sets for elements 202, 203, 212, and 213, and the element sets for functions 608-610 are now shown along with their corresponding function relationships 681-685. Knowledge elements 212 and 213 are related to function 610, and as a result, elements 212 and 213 each indicate the function ID for function 610 in their function set. Data element 202 and knowledge element 212 are related to function 608, and as a result, elements 202 and 212 each indicate the ID for function 608 in their function set. Data element 203 is related to function 609, and as a result, element 203 indicates the ID for function 609 in its function set.

In a reciprocal fashion, function 608 is in the function sets of elements 202 and 212, and as a result, function 608 indicates the UUIDs for elements 202 and 212 in its element set. Function 609 is in the function set of element 203, and as a result, function 609 indicates the UUID for element 203 in its element set. Function 610 is in the function sets of elements 212 and 213, and as a result, function 610 indicates the UUIDs for elements 212 and 213 in its element set.

FIGS. 8-11 illustrate various search techniques implemented by data processing system 112 to retrieve knowledge from biological data structure 114. Note that these search techniques are examples, and data processing system 112 may use other search techniques. It should be appreciated that these techniques could be combined and modified in various ways to provide a myriad of different search opportunities. For clarity, the amount and complexity of the searches, data elements, and knowledge elements have been restricted on FIGS. 8-11.

FIG. 8 illustrates knowledge sub-graph 800 induced by a search through biological data structure 114, where the search scope is first order external and second order external. Prior to inducing sub-graph 800, data processing system 112 receives query 104 indicating a biological molecular feature for an individual cancer patient. Query 104 also indicates the search scope: first order external and second order external. Responsive to query 104, data processing system 112 identifies data element 201 (molecular feature “A”) through molecular level pattern matching, attribute/function searching, or some other molecular feature search technique.

To induce sub-graph 800 responsive to the search scope, data processing system 112 initiates a first order external search by processing the external set of data element 201 to identify elements 202 and 211 and their corresponding first order relationships 251 and 253. For the second order external search, data processing system 112 processes the external sets of elements 202 and 211 from the first order search to identify elements 201 and 212-213 and their corresponding second order relationships 252 and 257-258. Data processing system 112 transfers knowledge result 105 indicating sub-graph 800 in response to query 104. Note that the search paths from element 201 to elements 212-213 are readily identifiable from knowledge result 105.

FIG. 9 illustrates knowledge sub-graph 900 induced by a search through biological data structure 114, where the search scope is first order internal and second order internal. Prior to inducing sub-graph 900, data processing system 112 receives query 104 indicating signaling pathway “X” based on a medical diagnosis for an individual cancer patient. Query 104 also indicates the search scope: first order internal and second order internal. Responsive to query 104, data processing system 112 identifies knowledge element 212 (signaling pathway “X”) from a semantic analysis of query 104, attribute/function searching, or some other query analysis technique.

To induce sub-graph 900 responsive to the search scope, data processing system 112 initiates a first order internal search by processing the internal set of element 212 to identify elements 204 and 211 and their corresponding first order relationships 255 and 257. For the second order internal search, data processing system 112 processes the internal sets of elements 204 and 211 from the first order search to identify elements 201 and 203 and their corresponding second order relationships 253-254. Data processing system 112 transfers knowledge result 105 indicating sub-graph 900 in response to query 104. Note that the search paths from element 212 to elements 201 and 203 are readily identifiable from knowledge result 105.

FIG. 10 illustrates knowledge sub-graph 1000 induced by a search through biological data structure 114, where the search scope is first order external and second order internal. Prior to inducing sub-graph 1000, data processing system 112 receives query 104 indicating the biological molecular features of an individual cancer patient. Query 104 also indicates the search scope: first order external and second order internal. Responsive to query 104, data processing system 112 identifies data element 201 (molecular feature “A”) through molecular level pattern matching, attribute/function searching, or some other molecular feature search technique.

To induce sub-graph 1000 responsive to the search scope, data processing system 112 initiates a first order external search by processing the external set of data element 201 to identify elements 202 and 211 and their corresponding first order relationships 251 and 253. For the second order internal search, data processing system 112 processes the internal sets of elements 202 and 211 from the first order search to identify elements 201 and 203 and their corresponding second order relationships 251 and 254. Data processing system 112 transfers knowledge result 105 indicating sub-graph 1000 in response to query 104. Note that the search paths from element 201 to elements 201 and 203 are readily identifiable from knowledge result 105.

FIG. 11 illustrates knowledge sub-graph 1100 induced by a search through biological data structure 114, where the search scope is first order internal and second order external. Prior to inducing sub-graph 1100, data processing system 112 receives query 104 indicating the biological molecular features of an individual cancer patient. Query 104 also indicates the search scope: first order internal and second order external. Responsive to query 104, data processing system 112 identifies data element 202 (molecular feature “B”) through molecular level pattern matching, attribute/function searching, or some other molecular feature search technique.

To induce sub-graph 1100 responsive to the search scope, data processing system 112 initiates a first order internal search by processing the internal set of data element 202 to identify element 201 and the corresponding first order relationship 251. For the second order external search, data processing system 112 processes the external set of data element 201 from the first order search to identify elements 202 and 211 and their corresponding second order relationships 251 and 253. Data processing system 112 transfers knowledge result 105 indicating sub-graph 1100 in response to query 104. Note that the search paths from element 202 to elements 202 and 211 are readily identifiable from knowledge result 105.

Note that a full (internal and external) search could be performed at any given order by combining internal and external search results for that order. Also note that different types of searches may be specified at the different orders, such as a full first order search combined with an internal second order search. The search order could be increased or decreased as well, and a first order search, third order search, tenth order search, or some other order search could be performed using the principles described herein. In addition, a search need not be limited to a given order and may be allowed to recursively traverse the element sets in an indefinite manner. It may be desirable to provide a user interface that allows the user to toggle between various search inputs, search scopes, and sub-graphs to uncover relevant knowledge.
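An unbounded traversal of the kind mentioned above can be sketched as a breadth-first walk that continues until no new element appears. `recursive_subgraph` and its `scope` strings are hypothetical names; the visited set is an assumption added so that the walk terminates even though direct relationships can form cycles, as with elements 201 and 202, which list each other.

```python
from collections import deque


def recursive_subgraph(seed, internal, external, scope="full"):
    """Traverse element sets with no order limit until nothing new appears.

    scope: "internal", "external", or "full" (both), applied at every step.
    """
    seen = {seed}
    queue = deque([seed])
    while queue:
        element = queue.popleft()
        neighbours = set()
        if scope in ("internal", "full"):
            neighbours |= internal.get(element, set())
        if scope in ("external", "full"):
            neighbours |= external.get(element, set())
        # The `seen` check stops cycles such as 201 <-> 202 from looping.
        for other in neighbours - seen:
            seen.add(other)
            queue.append(other)
    return seen


# Data set 200 (FIG. 3) expressed as internal and external sets.
INTERNAL = {"201": {"202"}, "202": {"201"}, "211": {"201", "203"},
            "212": {"204", "211"}, "213": {"203", "211", "212"}}
EXTERNAL = {"201": {"202", "211"}, "202": {"201"}, "203": {"211", "213"},
            "204": {"212"}, "211": {"212", "213"}, "212": {"213"}}
print(sorted(recursive_subgraph("204", INTERNAL, EXTERNAL, scope="external")))
# ['204', '212', '213']
```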

Also note that various rules could be applied to the data model described above. For example, a rule could be imposed that requires all elements to have an attribute with a value of “directed” or “undirected.” For elements with the directed attribute, another rule might stipulate that their internal element sets are ordered lists. In another example, a rule and corresponding attribute value may force an element to have an empty internal set, and thus, to behave like a node in a hypergraph. In yet another example, a rule and corresponding attribute values of “node” or “edge” could be used to prevent “node” elements from having other attributes while allowing “edge” elements to have attributes. For a directed hypergraph, a “mode” attribute could be used with various values and rules that force the desired directed hypergraph characteristics. For a relational database, the database fields could be attributes, and the rules would enforce the desired relational database constraints.
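A rule check of this sort might be sketched as a validator that reports violations for one element. The dict layout and the attribute names `"direction"` and `"role"` are hypothetical, and folding the empty-internal-set rule and the node/edge rule into a single `"role"` attribute is a simplification made for the sketch.

```python
def validate(element):
    """Check one element against the example rules; return violation messages.

    The element is a dict with an "attributes" mapping and an "internal"
    collection. "direction" and "role" are hypothetical attribute names.
    """
    problems = []
    attrs = element.get("attributes", {})
    internal = element.get("internal", [])

    direction = attrs.get("direction")
    if direction not in ("directed", "undirected"):
        problems.append("direction must be 'directed' or 'undirected'")
    elif direction == "directed" and not isinstance(internal, list):
        problems.append("directed elements need an ordered internal list")

    if attrs.get("role") == "node":
        # Nodes behave like hypergraph nodes: no internal set, no extra data.
        if internal:
            problems.append("node elements must have an empty internal set")
        extra = sorted(set(attrs) - {"direction", "role"})
        if extra:
            problems.append(f"node elements may not carry attributes {extra}")
    return problems


edge = {"attributes": {"direction": "directed", "role": "edge", "weight": 0.7},
        "internal": ["201", "203"]}
node = {"attributes": {"direction": "undirected", "role": "node", "age": 54},
        "internal": {"202"}}
print(validate(edge))
print(validate(node))
```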

In some examples, a control language may be used to search and maintain the data structure. The language could have persistent commands to create, modify, or remove elements, attributes, and functions. The language could have commands to induce subgraphs, such as recover, context, and expand commands to respectively induce internal and external, external-only, or internal-only subgraphs. The language could have commands to control the orders of the search and a format that allows different graph-induction approaches at each order of the search.
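
One possible format for per-order graph induction is a sequence of commands, one per search order. In the hypothetical sketch below, only the command-to-scope mapping (recover, context, expand) comes from the text; the whitespace-separated syntax, the ":N" repeat suffix, and the scope labels "full", "external", and "internal" are illustrative assumptions.

```python
from typing import Dict, List

# Command-to-scope mapping as described in the text.
COMMAND_SCOPES: Dict[str, str] = {
    "recover": "full",      # internal and external
    "context": "external",  # external-only
    "expand": "internal",   # internal-only
}

def parse_induction(statement: str) -> List[str]:
    """Turn a statement such as "recover context:2" into one scope per order.

    Each whitespace-separated token is a command, optionally followed by
    ":N" to repeat that command for N consecutive orders (hypothetical syntax).
    """
    scopes: List[str] = []
    for token in statement.lower().split():
        name, _, count = token.partition(":")
        if name not in COMMAND_SCOPES:
            raise ValueError(f"unknown induction command: {name!r}")
        repeat = int(count) if count else 1
        if repeat < 1:
            raise ValueError(f"order count must be positive: {token!r}")
        scopes.extend([COMMAND_SCOPES[name]] * repeat)
    if not scopes:
        raise ValueError("statement contains no induction commands")
    return scopes

print(parse_induction("recover expand"))  # ['full', 'internal']
```

Here "recover expand" describes a full first order search followed by an internal-only second order search, and the number of tokens (after repeats) fixes the search order.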

FIG. 12 illustrates bio-intelligence data system 1200 including a public data structure that is configured and operates like data structure 114. A public data processing system receives public drug data, medical concepts, and biological data (including public molecular feature data). The public data processing system formats and relates this public data in the public data structure as described above.

A private data processing system receives private medical and patient data. The private data processing system submits a query through the public data processing system to the public data structure. Although based on private patient data, the query is configured to maintain patient privacy. For example, the private patient name may be replaced by an anonymous code in the query.
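
The text states only that the patient name may be replaced by an anonymous code. One way to derive such a code, sketched below as an assumption rather than the described method, is a keyed hash (HMAC-SHA256 from the Python standard library) computed with a key held only by the private system. The query field names, the 16-character truncation, and the example data are hypothetical.

```python
import hashlib
import hmac
from typing import Dict, Iterable

def anonymous_code(secret_key: bytes, patient_name: str) -> str:
    """Keyed pseudonym: stable for one patient, but an outside party without
    the key cannot link the code to a name by hashing candidate names."""
    digest = hmac.new(secret_key, patient_name.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]

def build_public_query(secret_key: bytes, patient_name: str,
                       features: Iterable[str], scope: str,
                       lookup: Dict[str, str]) -> Dict[str, object]:
    """Build the query that leaves the private system.

    Only the anonymous code, molecular features, and search scope are sent;
    the code-to-name lookup table stays on the private side.
    """
    code = anonymous_code(secret_key, patient_name)
    lookup[code] = patient_name
    return {"subject": code, "features": list(features), "scope": scope}

private_lookup: Dict[str, str] = {}
query = build_public_query(b"private-site-key", "Jane Doe",
                           ["BRAF V600E"], "full", private_lookup)
print(query["features"], query["scope"])
```

Because the code is deterministic for a given key, repeated queries for the same patient map to the same code, and the private system can later resolve returned results through its lookup table.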

The public data processing system induces a knowledge subgraph from the public data structure responsive to the query. The public data processing system transfers the knowledge subgraph to the private data processing system. The private data processing system then integrates the private patient data with the knowledge subgraph to provide a rich set of public and private patient data that facilitates a more personalized medical approach. In a similar manner, the private data processing system may integrate private patient data with knowledge subgraphs to stratify patients for various private drug trials.
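
The integration and stratification steps can be sketched as follows, under assumptions not stated in the text: each returned subgraph is reduced to a set of element labels keyed by the anonymous code, private records are plain dictionaries, and a trial arm is defined by a hypothetical set of required knowledge elements. All names and data are illustrative.

```python
from typing import Dict, List, Set

def integrate(subgraphs: Dict[str, Set[str]], lookup: Dict[str, str],
              records: Dict[str, dict]) -> Dict[str, dict]:
    """Re-attach each returned knowledge subgraph (keyed by anonymous code)
    to the private patient record it was requested for."""
    merged: Dict[str, dict] = {}
    for code, elements in subgraphs.items():
        patient = lookup[code]  # resolved only on the private system
        merged[patient] = {**records[patient], "knowledge": set(elements)}
    return merged

def stratify(merged: Dict[str, dict],
             criteria: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Place each patient in every trial arm whose required knowledge
    elements all appear in that patient's subgraph."""
    strata: Dict[str, List[str]] = {arm: [] for arm in criteria}
    for patient in sorted(merged):
        knowledge = merged[patient]["knowledge"]
        for arm, required in criteria.items():
            if required <= knowledge:  # subset test
                strata[arm].append(patient)
    return strata

lookup = {"c1": "Jane Doe", "c2": "Raj Patel"}
records = {"Jane Doe": {"age": 54}, "Raj Patel": {"age": 61}}
subgraphs = {"c1": {"BRAF", "MAPK signaling"}, "c2": {"DNA repair"}}
merged = integrate(subgraphs, lookup, records)
print(stratify(merged, {"arm A": {"MAPK signaling"}, "arm B": {"DNA repair"}}))
# prints {'arm A': ['Jane Doe'], 'arm B': ['Raj Patel']}
```

Re-identification happens only inside the private system, so the public system never sees which patient a subgraph belongs to.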

FIG. 13 illustrates computer system 1300 to implement the data model to maintain biological data structure 1307 as described above. Computer system 1300 provides an example of computer system 110, although computer system 110 could use alternative configurations. Computer system 1300 comprises network transceiver 1301, user interface 1302, and processing system 1303. Processing system 1303 is linked to transceiver 1301 and user interface 1302. Processing system 1303 includes micro-processing circuitry 1304 and memory system 1305 that stores software 1306 and data structure 1307. Software 1306 includes software modules 1311-1314. Computer system 1300 may include other well-known components such as power supplies and enclosures that are not shown for clarity.

Network transceiver 1301 comprises communication circuitry and software for network communications. Network transceiver 1301 may use various protocols, such as Ethernet, Internet Protocol, and the like. Network transceiver 1301 receives data elements, knowledge elements, user instructions, and queries. Network transceiver 1301 transfers knowledge subgraphs. User interface 1302 comprises displays, input keys, mouse devices, touch pads, and the like.

Micro-processing circuitry 1304 comprises integrated circuitry that retrieves and executes software 1306 from memory system 1305 to maintain data structure 1307. Memory system 1305 comprises one or more non-transitory storage media, such as disk drives, flash drives, data storage circuitry, or some other memory apparatus. Micro-processing circuitry 1304 is typically mounted on circuit boards that may also hold components of memory system 1305, transceiver 1301, and user interface 1302. Software 1306 comprises computer programs, firmware, or some other form of machine-readable processing instructions. Software 1306 may include operating systems, utilities, drivers, network interfaces, applications, or some other type of software.

When executed by micro-processing circuitry 1304, data intake module 1311 directs processing system 1303 to receive and format data and knowledge elements for data structure 1307. When executed by micro-processing circuitry 1304, user interface module 1312 directs processing system 1303 to process user instructions that specify elements, attributes, functions, and element relationships for data structure 1307. When executed by micro-processing circuitry 1304, maintenance module 1313 directs processing system 1303 to maintain element, attribute, and function sets as described above. When executed by micro-processing circuitry 1304, query module 1314 directs processing system 1303 to induce subgraphs based on the information and search scope in the queries and the user instructions. Although not shown on FIG. 13, software 1306 may also include pattern matching logic to match biological molecular features or other analytical software to indicate element relationships.
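
The core bookkeeping a maintenance module would perform is keeping each external element set the exact mirror of the internal sets that reference it. The hypothetical in-memory class below sketches that invariant in Python; the class and method names are assumptions, sequential integers stand in for unique identifiers, self-association is disallowed because the sets hold "other" elements, and attribute and function sets (which could follow the same mirrored pattern) are omitted.

```python
import itertools
from typing import Dict, Set

class ElementStore:
    """Hypothetical store that keeps internal and external sets consistent."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self.payload: Dict[int, object] = {}
        self.internal: Dict[int, Set[int]] = {}
        self.external: Dict[int, Set[int]] = {}

    def add_element(self, payload: object) -> int:
        """Assign a unique identifier and start with empty element sets."""
        uid = next(self._ids)
        self.payload[uid] = payload
        self.internal[uid] = set()
        self.external[uid] = set()
        return uid

    def associate(self, owner: int, member: int) -> None:
        """Put `member` in owner's internal set and mirror it externally."""
        if owner == member:
            raise ValueError("element sets hold other elements only")
        if owner not in self.payload or member not in self.payload:
            raise KeyError("unknown element identifier")
        self.internal[owner].add(member)
        self.external[member].add(owner)

    def dissociate(self, owner: int, member: int) -> None:
        """Undo an association on both sides."""
        self.internal[owner].discard(member)
        self.external[member].discard(owner)

    def remove_element(self, uid: int) -> None:
        """Delete an element and scrub its identifier from all other sets."""
        for member in self.internal.pop(uid):
            self.external[member].discard(uid)
        for owner in self.external.pop(uid):
            self.internal[owner].discard(uid)
        del self.payload[uid]

    def is_consistent(self) -> bool:
        """Invariant: B is in internal[A] exactly when A is in external[B]."""
        forward = {(a, b) for a, bs in self.internal.items() for b in bs}
        backward = {(a, b) for b, owners in self.external.items() for a in owners}
        return forward == backward

store = ElementStore()
gene = store.add_element("BRAF")               # data element
pathway = store.add_element("MAPK signaling")  # knowledge element
disease = store.add_element("melanoma")
store.associate(pathway, gene)
store.associate(disease, gene)
print(sorted(store.external[gene]))  # [2, 3]
```

Updating both sides in the same call means external sets never need to be rebuilt by scanning every internal set, which is what makes external searches cheap at query time.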

The above examples deal with biological data and knowledge. Computer systems 110 and 1300 could also be operated to maintain and search with different types of data utilizing the data model described herein. Thus, the above teachings could be deployed in other technical areas, such as genealogy, demographics, or some other type of data structure or search engine.

The above description and associated figures teach the best mode of the invention. The following claims specify the scope of the invention. Note that some aspects of the best mode may not fall within the scope of the invention as specified by the claims. Those skilled in the art will appreciate that the features described above can be combined in various ways to form multiple variations of the invention. As a result, the invention is not limited to the specific embodiments described above, but only by the following claims and their equivalents.

What is claimed is:
 1. A method of operating a computer system to maintain a biological data structure having molecular feature data, the method comprising: receiving data elements indicating biological molecular features and receiving knowledge elements that represent biological concepts; individually associating unique identifiers with the elements; for individual ones of the elements, maintaining an internal element set of the other unique identifiers for the other elements that are directly associated with that one individual element; and for the individual ones of the elements, maintaining an external element set of the other unique identifiers for the other elements that have that one individual element in their own internal element sets.
 2. The method of claim 1 wherein: the data elements indicating the biological molecular features indicate at least one of: genes, gene variants, gene expression data, and gene states; and further comprising receiving additional data elements indicating at least one of: drugs, drug states, diseases, and disease states.
 3. The method of claim 1 wherein the biological concepts comprise at least one of: disease signatures, disease classifications, drug signatures, drug classifications, signaling pathways, and nucleic acid repairs.
 4. The method of claim 1 further comprising: receiving data attributes indicating values; individually associating attribute identifiers with the attributes; for individual ones of the attributes, maintaining an element set of the unique identifiers for the elements that are directly associated with that one individual attribute; and for the individual ones of the elements, maintaining an attribute set of the attribute identifiers for the attributes that are directly associated with that one individual element.
 5. The method of claim 1 wherein the data attributes comprise at least one of: age, geographic location, ethnicity, and gender.
 6. The method of claim 1 further comprising: receiving data functions representing data processing instructions; individually associating function identifiers with the functions; for individual ones of the functions, maintaining an element set of the unique identifiers for the elements that are directly associated with that one individual function; and for the individual ones of the elements, maintaining a function set of the function identifiers for the functions that are directly associated with that one individual element.
 7. The method of claim 1 wherein the data functions comprise at least one of: event handlers, message triggers, and counters.
 8. The method of claim 1 further comprising: receiving a data query indicating the molecular feature data for an individual biological entity and a search scope indicator; processing the molecular feature data and the data elements to identify ones of the data elements having corresponding biological molecular features; if the search scope indicator comprises an internal and external scope, then generating a knowledge sub-graph for the individual biological entity based on both the internal element sets and the external element sets of the identified data elements having the corresponding biological molecular features; if the search scope indicator comprises an internal scope, then generating the knowledge sub-graph for the individual biological entity based on the internal element sets but not the external element sets of the identified data elements having the corresponding biological molecular features; and if the search scope indicator comprises an external scope, then generating the knowledge sub-graph for the individual biological entity based on the external element sets but not the internal element sets of the identified data elements having the corresponding biological molecular features.
 9. The method of claim 1 further comprising: receiving a data query indicating the molecular feature data for an individual biological entity and a search scope indicator; processing the molecular feature data and the data elements to identify ones of the data elements having corresponding biological molecular features; if the search scope indicator comprises a second order scope, then generating a knowledge sub-graph for the individual biological entity based on the first order element sets of the identified data elements having the corresponding biological molecular features and also based on the second order element sets of the other data elements indicated in the first order element sets of the identified data elements having the corresponding biological molecular features.
 10. The method of claim 1 further comprising: receiving a data query indicating the molecular feature data for an individual biological entity and a search scope indicator that indicates a number of orders and an external/internal search indicator for each of the orders; processing the molecular feature data and the data elements to identify ones of the data elements having corresponding biological molecular features; and generating a knowledge sub-graph for the individual biological entity based on the number of orders, the external/internal search indicator for each of the orders, and the data elements having corresponding biological molecular features.
 11. A computer apparatus to operate a computer system to maintain a biological data structure, the computer apparatus comprising: at least one non-transitory computer-readable media having computer-useable instructions embodied thereon; the computer-useable instructions configured to direct the computer system, when executed by the computer system, to receive data elements indicating biological molecular features and receive knowledge elements that represent biological concepts, to individually associate unique identifiers with the elements, to maintain, for individual ones of the elements, an internal element set of the other unique identifiers for the other elements that are directly associated with that one individual element, and to maintain, for the individual ones of the elements, an external element set of the other unique identifiers for the other elements that have that one individual element in their own internal element sets.
 12. The computer apparatus of claim 11 wherein the data elements indicating the biological molecular features indicate at least one of: genes, gene variants, gene expression data, and gene states, and wherein the computer-useable instructions are configured to direct the computer system to receive additional data elements indicating at least one of: drugs, drug states, diseases, and disease states.
 13. The computer apparatus of claim 11 wherein the biological concepts comprise at least one of: disease signatures, disease classifications, drug signatures, drug classifications, signaling pathways, and nucleic acid repairs.
 14. The computer apparatus of claim 11 wherein the computer-useable instructions are configured to direct the computer system to: receive data attributes indicating values; individually associate attribute identifiers with the attributes; for individual ones of the attributes, maintain an element set of the unique identifiers for the elements that are directly associated with that one individual attribute; and for the individual ones of the elements, maintain an attribute set of the attribute identifiers for the attributes that are directly associated with that one individual element.
 15. The computer apparatus of claim 11 wherein the data attributes comprise at least one of: age, geographic location, ethnicity, and gender.
 16. The computer apparatus of claim 11 wherein the computer-useable instructions are configured to direct the computer system to: receive data functions representing data processing instructions; individually associate function identifiers with the functions; for individual ones of the functions, maintain an element set of the unique identifiers for the elements that are directly associated with that one individual function; and for the individual ones of the elements, maintain a function set of the function identifiers for the functions that are directly associated with that one individual element.
 17. The computer apparatus of claim 11 wherein the data functions comprise at least one of: event handlers, message triggers, and counters.
 18. The computer apparatus of claim 11 wherein the computer-useable instructions are configured to direct the computer system to: receive a data query indicating the molecular feature data for an individual biological entity and a search scope indicator; process the molecular feature data and the data elements to identify ones of the data elements having corresponding biological molecular features; if the search scope indicator comprises an internal and external scope, then generate a knowledge sub-graph for the individual biological entity based on both the internal element sets and the external element sets of the identified data elements having the corresponding biological molecular features; if the search scope indicator comprises an external scope, then generate the knowledge sub-graph for the individual biological entity based on the external element sets but not the internal element sets of the identified data elements having the corresponding biological molecular features; and if the search scope indicator comprises an internal scope, then generate the knowledge sub-graph for the individual biological entity based on the internal element sets but not the external element sets of the identified data elements having the corresponding biological molecular features.
 19. The computer apparatus of claim 11 wherein the computer-useable instructions are configured to direct the computer system to: receive a data query indicating the molecular feature data for an individual biological entity and a search scope indicator; process the molecular feature data and the data elements to identify ones of the data elements having corresponding biological molecular features; if the search scope indicator comprises a second order scope, then generate a knowledge sub-graph for the individual biological entity based on the first order element sets of the identified data elements having the corresponding biological molecular features and also based on the second order element sets of the other data elements indicated in the first order element sets of the identified data elements having the corresponding biological molecular features.
 20. The computer apparatus of claim 11 wherein the computer-useable instructions are configured to direct the computer system to: receive a data query indicating the molecular feature data for an individual biological entity and a search scope indicator that indicates a number of orders and an external/internal search indicator for each of the orders; process the molecular feature data and the data elements to identify ones of the data elements having corresponding biological molecular features; and generate a knowledge sub-graph for the individual biological entity based on the number of orders, the external/internal search indicator for each of the orders, and the data elements having the corresponding biological molecular features.